Why do people keep parsing HTML using regex? [closed]

Posted by polygenelubricants on Stack Overflow
Published on 2010-05-03T09:47:44Z Indexed on 2010/05/04 3:38 UTC

As much as I love regular expressions, it's obvious to me that they're not the best tool for parsing HTML, especially given the numerous good HTML parsers out there.

And yet there are numerous questions on Stack Overflow that attempt to parse HTML using regex. People always point out in the comments what a bad idea that is, and the accepted answer often carries a disclaimer that this isn't really the ideal way of doing things.

But based on the constant flow of questions, it still seems that people keep parsing HTML using regex, despite the perceived difficulty in reading and maintaining it (and that's putting correctness aside for now).
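To make the correctness concern concrete, here is a minimal sketch in Python using only the standard library. The sample markup, the regex pattern, and the names `SAMPLE`, `NAIVE_HREF`, and `LinkCollector` are all made up for illustration; the pattern is only meant to resemble the kind of thing that gets posted, not any particular answer.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample markup, chosen to include cases a naive regex mishandles.
SAMPLE = """
<a href="/one">one</a>
<a class="x" href='/two'>two</a>
<!-- <a href="/commented-out">ignored</a> -->
<A HREF="/three">three</A>
"""

# A naive pattern: assumes href is the first attribute, double-quoted, lowercase.
NAIVE_HREF = re.compile(r'<a href="([^"]*)"')


def hrefs_with_regex(html):
    """Collect href values with the naive pattern above."""
    return NAIVE_HREF.findall(html)


class LinkCollector(HTMLParser):
    """Collect href values from <a> start tags as the parser reports them."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names already lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.hrefs.append(value)


def hrefs_with_parser(html):
    """Collect href values using the standard-library HTML parser."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


if __name__ == "__main__":
    print(hrefs_with_regex(SAMPLE))
    print(hrefs_with_parser(SAMPLE))
```

On this sample the regex version misses the single-quoted and uppercase links and picks up the one inside the comment, while the parser version returns the three real links. The pattern could of course be patched for each case, which is roughly where the maintenance burden comes from.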

So my question is: why?

  • Is it because it's easy to learn?
  • Is it because it's faster?
  • Is it because it's the industry standard?
  • Is it because there are already so many reusable regexes to build from?
  • Is it because 100% correctness is never really the objective? (90% good enough?)
  • etc...

I'd also like to hear from the downvoters why they did so. Is it because:

  • There's absolutely nothing wrong with using regex to parse HTML and asking "Why?" is just dumb?
  • The premise of the question is flawed because the people who use regex to parse HTML are such a small minority?

© Stack Overflow or respective owner
